A Simple Ensemble Method for Hedge Identification

Authors

  • Ferenc Szidarovszky
  • Illés Solt
  • Domonkos Tikk
Abstract

In this paper we present a simple hedge identification method and its application to biomedical text. The problem at hand is a subtask of the CoNLL-2010 Shared Task. Our solution consists of two classifiers, a statistical one and a CRF model, and a simple schema that combines their predictions. We report in detail on each component of our system and discuss the results. We also show that a more sophisticated combination schema could improve the F-score significantly.

1 Problem definition

The CoNLL-2010 Shared Task focused on the identification and localization of uncertain information and its scope in text. In the first task, sentences had to be classified as either uncertain or not. The second task concentrated on identifying the source of uncertainty (the keyword or phrase that makes its context uncertain) and localizing its scope. The organizers provided training data from two application domains: biomedical texts and Wikipedia articles. For more details, see the overview paper by the organizers (Farkas et al., 2010).

We focused on task 1 and worked exclusively with biomedical texts. The biomedical training corpus contains selected abstracts and full-text articles from the BioScope corpus (Vincze et al., 2008). The corpus was manually annotated for hedge cues on the phrase level. Sentences containing at least one cue are considered uncertain, while sentences with no cues are considered factual. Although cue tagging was given in the training data, marking cues in the submission was not mandatory.

Systems were evaluated on task 1 at the sentence level, with the F-measure of the uncertain class as the official evaluation metric. Evaluation corpora from both domains were also provided, allowing both in-domain and cross-domain experiments. Nevertheless, we restricted the scope of our system to the in-domain biomedical subtask.
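The sentence-level labeling rule and the evaluation metric described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the OR-style combination of two classifiers' predictions is only an assumption made for illustration; the abstract does not specify the authors' actual combination schema.

```python
from typing import List, Sequence


def sentence_is_uncertain(cue_tags: Sequence[bool]) -> bool:
    """A sentence counts as uncertain if at least one token is a hedge cue."""
    return any(cue_tags)


def combine_or(pred_a: Sequence[bool], pred_b: Sequence[bool]) -> List[bool]:
    """Assumed combination rule (not taken from the paper):
    label a sentence uncertain if either classifier does."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [a or b for a, b in zip(pred_a, pred_b)]


def f_measure_uncertain(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Balanced F-measure (F1) of the uncertain class on sentence labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: True means "uncertain". These labels are invented for illustration.
gold = [True, True, False, False]
statistical_pred = [True, False, False, False]
crf_pred = [False, False, True, False]

combined = combine_or(statistical_pred, crf_pred)
score = f_measure_uncertain(gold, combined)
```

On this toy data the combined prediction yields one true positive, one false positive, and one false negative for the uncertain class. An OR rule of this kind tends to trade precision for recall, which is one reason a more sophisticated combination could behave differently.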

Similar resources

Hedge Detection Using the RelHunter Approach

RelHunter is a Machine Learning based method for the extraction of structured information from text. Here, we apply RelHunter to the Hedge Detection task, proposed as the CoNLL-2010 Shared Task. RelHunter’s key design idea is to model the target structures as a relation over entities. The method decomposes the original task into three subtasks: (i) Entity Identification; (ii) Candidate Relatio...

Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability

Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. In this regard, an important issue is that preparation of some features may need conducting difficult and costly experiments while these features have less significant impacts on the final decision and can be ignored from the feature set. Therefore, developing a machine for p...

Simulation of Boiling in a Vertical Channel Using Ensemble Average Model

Simulation of turbulence boiling, generation of vapour and prediction of its behaviour are still subject to debate in the two-phase flow area and they receive a high level of worldwide attention. In this study, a new arrangement of the three dimensional governing equations for turbulence two-phase flow with heat and mass transfer is derived by using ensemble averaging two-fluid model and ...

Study on Gold as a Hedge or Safe Haven for the Stock Market by a Markov Switching Approach

Although gold is no longer a central cornerstone of the international monetary and financial system, it still attracts considerable attention from researchers and investors. Nowadays, many investors manage their risk with valuable assets such as gold. This paper examines the dynamic relationships between gold and stock markets in the Tehran Stock Exchange. We have applied the Markov switching m...

Detecting uncertainty in biomedical literature: a simple disambiguation approach using sparse random indexing

This paper presents a novel approach to the problem of hedge detection, which involves the identification of so-called hedge cues for labeling sentences as certain or uncertain. This is the classification problem for Task 1 of the CoNLL-2010 Shared Task, which focuses on hedging in biomedical literature. We here propose to view hedge detection as a simple disambiguation problem, restricted to w...


Publication date: 2010